[ENH]: add batch get version file paths method to Sysdb #4432

Merged (1 commit) on May 22, 2025

Conversation

@codetheweb codetheweb commented May 3, 2025

Description of changes

Needed when garbage collecting a collection in a fork tree, because we need to read the version files of all other nodes in the tree.

Test plan

How are these changes tested?

Added a test.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

github-actions bot commented May 3, 2025

Reviewer Checklist

Please use this checklist to ensure your code review is thorough before approving.

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from ab87d31 to 7e6889e Compare May 5, 2025 19:36
codetheweb commented May 5, 2025

@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from 4d29f0d to 0d136b9 Compare May 8, 2025 23:30
propel-code-bot bot commented May 8, 2025

Add Batch Get Collection Version File Paths to SysDB API

This PR introduces a new batch API to retrieve version file paths for multiple collections in SysDB, which is necessary for operations like garbage collecting collections in fork trees. The change spans both Rust and Go components, defining new proto messages and service endpoints, implementing the server, coordinator, and DAO methods, providing tests, and adding error handling for batch retrieval.

Key Changes:
• New proto messages and SysDB RPC method (BatchGetCollectionVersionFilePaths) added to idl/chromadb/proto/coordinator.proto
• Rust Sysdb adds a batch_get_collection_version_file_paths method, including an error enum and wiring through the gRPC and test implementations
• Go code: Implements coordinator, service, DAO, and model interfaces for batch retrieval of collection version file paths
• Updates Go mocks and adds a new unit/integration test in coordinator_test.go
• Integration with both test and actual data paths for version files in collection data models

Affected Areas:
• SysDB API (Rust & Go)
• Protobuf interface definition
• Rust sysdb.rs and test_sysdb.rs
• Go coordinator, catalog, DAO, mock, test layers

Potential Impact:

Functionality: Enables efficient retrieval of version file paths for batched collection IDs, improving garbage collection workflows. No breaking changes to existing APIs.

Performance: Batch fetch is more efficient for clients. Current Go implementation uses IN queries, which may have DB/backend limitations for very large batches (see code TODO).

Security: No direct security impact; assumes existing API authentication applies.

Scalability: Scales better than single-fetch-per-collection but might bottleneck on database IN queries for very large batch sizes; further optimization noted as a TODO.
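
One common way to sidestep limits on very large IN lists is to split the IDs into fixed-size chunks and run one query per chunk. The sketch below shows that idea only; it is not what the PR implements, and `chunkIDs`, `batchLookup`, and the `lookup` callback are hypothetical names standing in for the real DAO query.

```go
package main

import "fmt"

// chunkIDs splits ids into consecutive slices of at most size elements.
// It returns nil when size is not positive.
func chunkIDs(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	chunks := make([][]string, 0, (len(ids)+size-1)/size)
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		chunks = append(chunks, ids[start:end])
	}
	return chunks
}

// batchLookup calls lookup once per chunk and merges the partial results.
// lookup is a hypothetical stand-in for a single "WHERE id IN (...)" query.
func batchLookup(ids []string, size int, lookup func([]string) (map[string]string, error)) (map[string]string, error) {
	if size <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", size)
	}
	merged := make(map[string]string, len(ids))
	for _, chunk := range chunkIDs(ids, size) {
		part, err := lookup(chunk)
		if err != nil {
			return nil, err
		}
		for id, path := range part {
			merged[id] = path
		}
	}
	return merged, nil
}

func main() {
	ids := []string{"c1", "c2", "c3", "c4", "c5"}
	calls := 0
	// fakeQuery pretends to be the database and counts how often it is hit.
	fakeQuery := func(chunk []string) (map[string]string, error) {
		calls++
		out := make(map[string]string, len(chunk))
		for _, id := range chunk {
			out[id] = "versions/" + id
		}
		return out, nil
	}
	paths, err := batchLookup(ids, 2, fakeQuery)
	fmt.Println(len(paths), calls, err)
}
```

The trade-off is more round trips in exchange for bounded query size; the right chunk size depends on the database and driver in use.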

Review Focus:
• Batch query implementation in Go DAO (possible IN clause scalability)
• Error handling and proto/Rust error propagation
• Interface contract between proto, Go, and Rust layers for batch calls
• Correctness of test coverage and edge cases

Testing Needed

• Verify the batch API returns correct file paths for varying batch sizes and non-existent collection IDs
• Run the new Go API test (TestBatchGetCollectionVersionFilePaths) and associated Rust test
• Test error handling paths (e.g., missing collections)

Code Quality Assessment

go/pkg/sysdb/metastore/db/dao/collection.go: Batch query is direct and uses standard GORM patterns, but is flagged with a 'todo: make more efficient' note.

go/pkg/sysdb/metastore/db/dbmodel/mocks/ICollectionDb.go: Regenerated, OK.

go/pkg/sysdb/coordinator/coordinator_test.go: New test covers positive path; well integrated.

rust/sysdb/src/sysdb.rs: New method added cleanly, uses appropriate Rust error handling and enum patterns. Some branches are TODO (SQLite).

rust/sysdb/src/test_sysdb.rs: Extended with set/get functionality and error coverage. Logic is straightforward.

go/pkg/sysdb/coordinator/table_catalog.go: Implements the batch method directly; review notes touch on exposure of proto types.

Best Practices

API Design:
• Batched API style for efficiency
• Consistent error return types

Testing:
• Covers new API with integration test

Error Handling:
• Custom error type, clear propagation

Separation Of Concerns:
• Some mixing of proto/internal types in Go; future refactor suggested

Potential Issues

• Potential performance bottleneck for large batch sizes in the DAO's SQL IN queries
• Inconsistent exposure of proto types to business logic (see code review discussion)
• SQLite implementation in Rust is left as todo!()

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch 3 times, most recently from 93be17d to ea366a5 Compare May 15, 2025 20:11
@codetheweb codetheweb changed the base branch from main to feat-return-collection-with-version-file-lineage-file-root-collection-2 May 15, 2025 20:11
@codetheweb codetheweb force-pushed the feat-return-collection-with-version-file-lineage-file-root-collection-2 branch from b39fd56 to 8a3421d Compare May 15, 2025 22:38
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from ea366a5 to edf6743 Compare May 15, 2025 22:38
@codetheweb codetheweb force-pushed the feat-return-collection-with-version-file-lineage-file-root-collection-2 branch from 8a3421d to 972427e Compare May 15, 2025 23:34
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch 4 times, most recently from 505c4ae to 3ece42b Compare May 16, 2025 16:41
@codetheweb codetheweb force-pushed the feat-return-collection-with-version-file-lineage-file-root-collection-2 branch from 972427e to bab5b68 Compare May 19, 2025 23:38
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from 3ece42b to 5801be6 Compare May 19, 2025 23:38
@codetheweb codetheweb requested a review from sanketkedia May 19, 2025 23:45
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from 5801be6 to 90ac80a Compare May 19, 2025 23:45
@codetheweb codetheweb force-pushed the feat-return-collection-with-version-file-lineage-file-root-collection-2 branch from bab5b68 to 5f7fa4a Compare May 20, 2025 18:02
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from 90ac80a to 2e26e8a Compare May 20, 2025 18:02
@codetheweb codetheweb changed the base branch from feat-return-collection-with-version-file-lineage-file-root-collection-2 to graphite-base/4432 May 20, 2025 18:42
@codetheweb codetheweb force-pushed the graphite-base/4432 branch from 5f7fa4a to 71c953d Compare May 20, 2025 18:42
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from 2e26e8a to 7344e5c Compare May 20, 2025 18:42
@graphite-app graphite-app bot changed the base branch from graphite-base/4432 to main May 20, 2025 18:42
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch 2 times, most recently from c3a7502 to 2425e04 Compare May 20, 2025 19:41
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from 2425e04 to a09392f Compare May 21, 2025 16:45
@codetheweb codetheweb force-pushed the feat-sysdb-batch-get-version-file-paths-method branch from a09392f to 1af74ec Compare May 22, 2025 23:23
@codetheweb codetheweb merged commit 2f4ce46 into main May 22, 2025
72 checks passed
3 participants